Three New Corpora at the Bavarian Archive for Speech Signals - and a First Step Towards Distributed Web-Based Recording
Abstract
The Bavarian Archive for Speech Signals (BAS) has released three new speech corpora for both industrial and academic use: a) Hempels Sofa, which contains recordings of up to 60 seconds of non-scripted telephone speech; b) ZipTel, a corpus of telephone speech covering postal addresses and telephone numbers from a real-world application; and c) RVG-J, an extension of the original Regional Variants of German corpus with juvenile speakers. All three corpora were transcribed orthographically according to the SpeechDat annotation guidelines using the WWWTranscribe annotation software. Recently, BAS has begun to investigate large-scale audio recording via the web, with RVG-J serving as the testbed for this type of recording.
Related resources
The SmartWeb Corpora: Multimodal Access to the Web in Natural Environments
As a result of the German SmartWeb project, three speech corpora, one of them multimodal, have been published by the Bavarian Archive for Speech Signals (BAS). They contain speech and video signals from human–machine interactions in real indoor and outdoor environments. The scenarios for these corpora are a typical handheld PDA interaction (SHC), an interaction on a running motorcycle (SMC) a...
Phonemic Segmentation and Labelling using the MAUS Technique
We describe the pronunciation model of the automatic segmentation technique MAUS, which is based on a data-driven Markov process, and a new evaluation measure for phonemic transcripts, relative symmetric accuracy; results are given for the MAUS segmentation and labelling on German dialog speech. MAUS is currently distributed as a freeware package by the Bavarian Archive for Speech Signals and will also be ...
Wikispeech - a content management system for speech databases
In this paper we describe WikiSpeech, a content management system for the web-based creation of speech databases for the development of spoken language technology and basic research. Its main features are full support for the typical recording, annotation and project administration workflow, easy editing of the speech content, plus a fully localizable user interface. For the creation of a new s...
Web-Based Speech Data Collection and Annotation
The WWW is a ubiquitous, mature communication infrastructure for business and scientific information interchange. Since 1997, the Bavarian Archive for Speech Signals (BAS) has been developing and using web-based annotation tools for large-scale speech databases. Recently it has developed an application for recording speech via the WWW. Both the annotation and the recording tools are now integrat...
PercyConfigurator - Perception Experiments as a Service
PercyConfigurator is an experiment editor that eliminates the need for programming; the experiment definition and content are simply dropped onto the PercyConfigurator web page for interactive editing and testing. When the editing is done, the experiment definition and content are uploaded to the server. The server returns a link to the experiment which is then distributed to potential particip...
Publication date: 2002